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ABSTRACT 

Students regularly are asked to solve Single Factor Analysis of Variance problems given only the 
sample summary statistics (number of observations per category, category means, and 
corresponding category standard deviations). Most undergraduate students today use Excel for 
data analysis of this type. However, Excel, like all other statistical software packages, requires an 
input data set in order to invoke its Anova: Single Factor procedure. The purpose of this paper is 
therefore to provide the student with an Excel macro that, given just the sample summary statistics 
as input, generates an equivalent underlying data set. This data set can then be used as the 
required input data set in Excel for Single Factor Analysis of Variance. 
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INTRODUCTION 



ost students have migrated to Excel for data analysis because of Excel’s pervasive accessibility. 
This means, given just the summary statistics for Analysis of Variance, the Excel user is either 
limited to solving the problem by hand or to solving the problem using an Excel Add-in. Both of 
these options have shortcomings. Solving by hand means there is no Excel counterpart solution that puts the entire 
answer ‘right there in front of the student’. Using an Excel Add-in is often not an option because, typically, Excel 
Add-ins are not available. The purpose of this paper, therefore, is to explain how to accomplish analysis of variance 
using an equivalent input data set and also to provide the Excel user with a straight-forward macro that accomplishes 
this technique. The technique works for Single Factor Analysis of Variance because, given a set of summary 
statistics (which are also the sufficient statistics in this instance), it is easy to verify the following two equations do 
generate an artificial data set with identical summary statistics (2) : 


V 

xln 

i = 1, 2,...,n-l 

y„ =n*x-(n-l)*y i 


where, x = sample mean, s = sample standard deviation, and n = number of category observations 

Single Factor Analysis Of Variance In Excel When The Actual Sample Data Set Is Available 

To begin with, assume that the required input data set is available with which to do an analysis of variance 
in Excel. An example input data set is shown below. 
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In Excel, the following sequence is invoked in order to generate the analysis of variance solution for the 
above data: Data (tab) - Analysis (panel on right) - click on Data Analysis (in Analysis panel) - select Anova: 
Single Factor on the Data Analysis screen - OK. At this point in the sequence, the Anova: Single Factor screen 
(below) is filled in [note: the first requirement is the input data set (in above cells B3:E7, in this case)]. 



When OK is clicked on the above screen, the following Anova: Single Factor solution is generated by 
Excel (note: since Excel is a ‘live’ spreadsheet, the default output in this case has been ‘cleaned up’ appropriately). 



A 

B 

C 

O 

I 

F 

0 

1 


Category 



2 

Qbs. 

A 

B 

C 

D 



3 

1 

7.6 

8.5 

68 

7.4 



4 

2 

3.3 

8.7 

6.7 

6.5 

Actual 


5 

3 

7.6 

7 7 

6.6 

6.8 

Data Set 


6 

4 

83 

6,4 




7 

5 

8.7 





8 

Anova. Single Factor 
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Single Factor Analysis Of Variance In Excel When The Actual Sample Data Set Is Not Available (Only 
Summary Statistics Are Available) 

Next, the Excel generated summary statistics above (Nj in cells B10:B13, X’s in cells D10:D13 and s’s in 
cells F10:F13) are used as input into the macro in order to generate the required equivalent input data set. 

The Excel macro is displayed below. After accessing the macro, the first screen, as shown immediately below, 
instructs you to Click Here for Data Entry Form (2) . 
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The first macro screen above indicates that the inputs are the number of observations per category (Nj), the 
category means ( X)’s, and the corresponding standard deviations (s’s) for each category. The Yi and Yn are the 
corresponding generated observations which are used to build the equivalent data set. Click Here for Data Entry 
Form moves the user to the second screen (shown below). 


Data Entry Form 


Enter Number of 
Variables 



xj 


QUIT 


On the second screen (shown above), you are asked to first enter the number of variables (= to categories or 
groups), which is 4 for the above example (A, B, C, and D). Then you click on Input Data and you begin with 
category A and, on separate screens, do the following: Enter Sample Size (Nj), Enter Sample Mean, and Enter 
Sample Standard Deviation. You do the same entry sequence for categories B, C, and D. When you have done this, 
you click on Generate Raw Data and the following information is displayed. [Note: the screens involved are clearly 
labeled and permit you to correct mistakes, if any]. After you select Generate Raw Data, the following results are 
displayed. 
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7.6331 


The above displayed values in C5:C8, E5:E8, and G5:G8 allow you to verify that you have entered 
correctly, for all four categories respectively, the number of observations (Nj), Means and Standard Deviations. The 
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values in 15:18 and K5:K8 display the values used to construct the equivalent data set. The constructed equivalent 
data set is displayed in columns M through P on the above screen. In addition, the equivalent data set shown above 
is also placed (by itself) on a separate Excel sheet labeled DATA. This is done in order to facilitate entering the 
equivalent data set into Excel (especially in the cases of large equivalent data sets). In this case, the equivalent data 
set on the DATA sheet appears as follows; 
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The above data set is then easily used as the required input data set in the standard way for the Anova: 
Single Factor procedure in Excel. The Anova: Single Factor screen in this case is simply filled in as follows (Note: 
the above equivalent data set is obviously an artificial data set. In our opinion this is an important property of the 
surrogate data set because it reinforces the idea that the summary statistics are the sufficient statistics in this case.); 



After OK is clicked on the above screen, the following Excel solution is generated; 
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As the reader can verify, the Excel solution above is identical to the prior Excel solution which was 
obtained using the actual underlying data set. This means the student, given just summary statistics, can follow the 
standard Excel steps in generating the Excel answer. He or she, therefore, has, among other things, ‘everything right 
there in front of them’ for confirmation purposes when working through this type of problem. In addition, and very 
importantly, once the Excel solution is obtained using the equivalent data set methodology, the Tukey pair wise 
comparisons (if warranted), for example, are easily added to the Excel output obtained using the equivalent data set. 

Practice Problem 

Golf balls were tested for hitting distance on a ball-driving machine. Four brands of golf balls were tested 
using random samples of seven golf balls of each brand. Summary statistics of the results are given below. In 
Excel, perform the correct Single Factor Analysis of Variance (use a = .05) in this case to determine if mean 
distances are equal for all four brands of golf balls. What is the decision on the null hypothesis in this case? 
Explain. 
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Hint: Fob = 6.9746; P-value(ob) = .0018. [The interested reader can contact the authors if they wish to learn how 
to add the Tukey pair wise comparisons efficiently to the Excel solution for this problem.) 
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NOTES 


The Excel macro is available from the authors upon request. Additionally, a complete set of instructions 
for implementing the macro are will also be included. To obtain the items contact: dlarson@usouthal.edu 
[Practitioners who use Excel and obtain their data only in summary form may, of course, also obtain the macro and 
the corresponding implementation instructions.] 
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